
Novel Benchmark for NER in the Wastewater and Stormwater Domain

Cardillo, Franco Alberto, Debole, Franca, Frontini, Francesca, Aelami, Mitra, Chahinian, Nanée, Conrad, Serge

arXiv.org Artificial Intelligence

The effective management of wastewater and stormwater systems is crucial for urban sustainability and environmental protection. These systems, which form an integral part of public infrastructure, require structured information for monitoring, planning, and maintenance. However, much of the relevant information exists in unstructured textual formats, such as technical reports, regulatory documents, and maintenance logs. Extracting information from these sources is a key challenge due to domain-specific terminology and the multilingual nature of regulatory and operational contexts. Typically, a wastewater management information extraction application will require domain-specific entity recognition, followed by the extraction of relations between entities to support decision-making, automated reasoning, and linking to existing knowledge bases. Recent progress in domain-specific Named Entity Recognition (NER) has the potential to greatly facilitate the development of such applications. However, to effectively evaluate this first and crucial step of the extraction pipeline, it is essential to establish a clearly defined set of extractable entities and construct a multilingual benchmark corpus. Building on previous work, carried out within the framework of a national project on just one language, we propose the following contributions: the starwars corpus, an aligned French-Italian corpus containing domain-specific texts.
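Evaluating the NER step described above typically means comparing predicted entity spans against gold annotations. The sketch below is illustrative, not from the paper: the entity types (`INFRA`, `EVENT`) and the BIO tag sequences are hypothetical, and span-level F1 is shown as one common way such a benchmark might be scored.

```python
# Hypothetical sketch: span-level evaluation of domain NER predictions.
# Entity types (INFRA, EVENT) are invented for illustration.

def spans_from_bio(tags):
    """Convert a BIO tag sequence into a set of (start, end, label) spans."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                spans.add((start, i, label))
                start, label = None, None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
    return spans

def span_f1(gold_tags, pred_tags):
    """Exact-match precision/recall/F1 over entity spans."""
    g, p = spans_from_bio(gold_tags), spans_from_bio(pred_tags)
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# "storm drain" (INFRA) found; "overflow" (EVENT) missed by the model.
gold = ["B-INFRA", "I-INFRA", "O", "B-EVENT", "O"]
pred = ["B-INFRA", "I-INFRA", "O", "O", "O"]
print(round(span_f1(gold, pred), 3))  # → 0.667
```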


AuctionNet: A Novel Benchmark for Decision-Making in Large-Scale Games

Neural Information Processing Systems

Decision-making in large-scale games is an essential research area in artificial intelligence (AI) with significant real-world impact. However, the limited access to realistic large-scale game environments has hindered research progress in this area. In this paper, we present AuctionNet, a benchmark for bid decision-making in large-scale ad auctions derived from a real-world online advertising platform. AuctionNet is composed of three parts: an ad auction environment, a pre-generated dataset based on the environment, and performance evaluations of several baseline bid decision-making algorithms. More specifically, the environment effectively replicates the integrity and complexity of real-world ad auctions through the interaction of several modules: the ad opportunity generation module employs deep generative networks to bridge the gap between simulated and real-world data while mitigating the risk of sensitive data exposure; the bidding module implements diverse auto-bidding agents trained with different decision-making algorithms; and the auction module is anchored in the classic Generalized Second Price (GSP) auction but also allows for customization of auction mechanisms as needed.
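The Generalized Second Price rule that anchors the auction module can be stated compactly: bidders are ranked by bid, and the winner of each slot pays the next-highest bid. The sketch below shows only this textbook pricing rule with made-up bids; it is not AuctionNet's implementation, which also supports customized mechanisms.

```python
# Minimal sketch of the classic Generalized Second Price (GSP) rule.
# Bidder names, bids, and slot count are illustrative.

def gsp_allocate(bids, num_slots):
    """bids: {bidder: bid}. Returns [(bidder, price)] per slot, ranked by
    bid; each winner pays the next-highest bid (0.0 if there is none)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for slot in range(min(num_slots, len(ranked))):
        bidder, _ = ranked[slot]
        price = ranked[slot + 1][1] if slot + 1 < len(ranked) else 0.0
        results.append((bidder, price))
    return results

print(gsp_allocate({"a": 5.0, "b": 3.0, "c": 1.0}, num_slots=2))
# → [('a', 3.0), ('b', 1.0)]: a pays b's bid, b pays c's bid
```

A key property of GSP, and part of what makes auto-bidding a hard decision problem, is that unlike a single-item second-price auction it is not truthful: shading one's bid can change slot assignment and payment simultaneously.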


From Text Segmentation to Smart Chaptering: A Novel Benchmark for Structuring Video Transcriptions

Retkowski, Fabian, Waibel, Alexander

arXiv.org Artificial Intelligence

Text segmentation is a fundamental task in natural language processing, where documents are split into contiguous sections. However, prior research in this area has been constrained by limited datasets, which are either small in scale, synthesized, or only contain well-structured documents. In this paper, we address these limitations by introducing a novel benchmark, YTSeg, focusing on spoken content that is inherently more unstructured and both topically and structurally diverse. As part of this work, we introduce an efficient hierarchical segmentation model, MiniSeg, that outperforms state-of-the-art baselines. Lastly, we expand the notion of text segmentation to a more practical "smart chaptering" task that involves the segmentation of unstructured content, the generation of meaningful segment titles, and a potential real-time application of the models.
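Segmentation benchmarks like the one above are commonly scored with the P_k metric, though the abstract does not name the metric used. The sketch below shows the standard P_k computation on invented segment labelings: slide a window of size k and count how often reference and hypothesis disagree about whether two positions fall in the same segment.

```python
# Sketch of the standard P_k text-segmentation metric (illustrative;
# the abstract does not state which metric the benchmark uses).

def pk(reference, hypothesis, k):
    """reference/hypothesis: one segment label per sentence, e.g. [0,0,1,1].
    Returns the fraction of windows of size k where the two labelings
    disagree on whether positions i and i+k share a segment."""
    n = len(reference)
    errors = 0
    for i in range(n - k):
        same_ref = reference[i] == reference[i + k]
        same_hyp = hypothesis[i] == hypothesis[i + k]
        errors += same_ref != same_hyp
    return errors / (n - k)

ref = [0, 0, 0, 1, 1, 1]  # true chapter boundary after sentence 3
hyp = [0, 0, 1, 1, 1, 1]  # predicted boundary one sentence early
print(pk(ref, hyp, k=2))  # → 0.5
```

Lower is better: a perfect segmentation scores 0.0, and placing a boundary one position off is penalized only for the windows that straddle the error.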


CoheSentia: A Novel Benchmark of Incremental versus Holistic Assessment of Coherence in Generated Texts

Maimon, Aviya, Tsarfaty, Reut

arXiv.org Artificial Intelligence

Coherence is a linguistic term that refers to the relations between small textual units (sentences, propositions), which make the text logically consistent and meaningful to the reader. With the advances of generative foundational models in NLP, there is a pressing need to automatically assess the human-perceived coherence of automatically generated texts. Up until now, little work has been done on explicitly assessing the coherence of generated texts and analyzing the factors contributing to (in)coherence. Previous work on the topic used other tasks, e.g., sentence reordering, as proxies of coherence, rather than approaching coherence detection head-on. In this paper, we introduce CoheSentia, a novel benchmark of human-perceived coherence of automatically generated texts. Our annotation protocol reflects two perspectives: one is global, assigning a single coherence score, and the other is incremental, scoring sentence by sentence. The incremental method produces an (in)coherence score for each text fragment and also pinpoints reasons for incoherence at that point. Our benchmark contains 500 automatically generated and human-annotated paragraphs, each annotated with both methods, by multiple raters. Our analysis shows that the inter-annotator agreement in the incremental mode is higher than in the holistic alternative, and our experiments show that standard LMs fine-tuned for coherence detection show varied performance on the different factors contributing to (in)coherence. All in all, these models yield unsatisfactory performance, emphasizing the need for developing more reliable methods for coherence assessment.
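The contrast between the two annotation modes can be made concrete with a small data sketch. Everything below is illustrative: the record layout, scores, reasons, and mean aggregation are assumptions, not the paper's protocol.

```python
# Hedged sketch of the two annotation perspectives: holistic assigns one
# paragraph-level score, while incremental scores sentence by sentence and
# records a reason wherever incoherence is judged to arise. The mean
# aggregation is illustrative only.

from statistics import mean

incremental = [
    {"sentence": 1, "score": 5, "reason": None},
    {"sentence": 2, "score": 4, "reason": None},
    {"sentence": 3, "score": 2, "reason": "contradicts sentence 1"},
]

def aggregate(annotations):
    """Collapse incremental scores into one holistic-style score and
    collect the pinpointed incoherence reasons."""
    score = mean(a["score"] for a in annotations)
    reasons = [(a["sentence"], a["reason"]) for a in annotations if a["reason"]]
    return score, reasons

score, reasons = aggregate(incremental)
print(round(score, 2), reasons)
```

The incremental record keeps exactly the information the holistic score discards: which sentence broke coherence and why, which is what the abstract credits for the higher inter-annotator agreement.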


UIT-HWDB: Using Transferring Method to Construct A Novel Benchmark for Evaluating Unconstrained Handwriting Image Recognition in Vietnamese

Nguyen, Nghia Hieu, Vo, Duong T. D., Van Nguyen, Kiet

arXiv.org Artificial Intelligence

Recognizing handwriting images is challenging due to the vast variation in writing style across people and the distinct linguistic aspects of the written language. In Vietnamese, besides the modern Latin characters, there are accents and letter marks, together with characters, that confuse state-of-the-art handwriting recognition methods. Moreover, as a low-resource language, Vietnamese has few datasets for handwriting recognition research, which creates a barrier for researchers approaching this task. Recent works evaluated offline handwriting recognition methods in Vietnamese using images from an online handwriting dataset, constructed by connecting pen-stroke coordinates without further processing. This approach cannot effectively measure the ability of recognition methods, as it is trivial and may lack features that are essential in offline handwriting images. Therefore, in this paper, we propose the Transferring method to construct a handwriting image dataset that incorporates the crucial natural attributes required of offline handwriting images. Using our method, we provide the first high-quality synthetic dataset that is complex and natural enough to effectively evaluate handwriting recognition methods. In addition, we conduct experiments with various state-of-the-art methods to identify the challenges that remain for handwriting recognition in Vietnamese.
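The naive baseline the paper criticizes, connecting pen-stroke coordinates without further processing, amounts to rasterizing line segments between consecutive online points. The sketch below shows only that baseline on made-up coordinates; the paper's Transferring method goes further by adding the natural attributes (e.g. realistic stroke appearance) this simple rendering lacks.

```python
# Sketch of the naive online-to-offline conversion: draw straight line
# segments between consecutive pen-stroke coordinates on a pixel grid.
# Coordinates and grid size are illustrative.

def rasterize(strokes, width, height):
    """strokes: list of [(x, y), ...] point sequences. Returns a binary
    pixel grid with segments interpolated between consecutive points."""
    img = [[0] * width for _ in range(height)]
    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            steps = max(abs(x1 - x0), abs(y1 - y0), 1)
            for t in range(steps + 1):
                x = round(x0 + (x1 - x0) * t / steps)
                y = round(y0 + (y1 - y0) * t / steps)
                img[y][x] = 1
    return img

img = rasterize([[(0, 0), (3, 3)]], width=4, height=4)
print(sum(map(sum, img)))  # → 4: a one-pixel-wide diagonal
```

The output makes the paper's objection visible: every stroke is a one-pixel-wide polyline, with none of the pen-width, pressure, or texture variation present in real scanned handwriting.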